When Should Mayfield Model Data be Discarded?
Abstract
Much confusion exists over the proper way to handle nest-fate data collected after the fledge date when using the Mayfield method. I provide a simple numerical example showing how use of these data can bias estimates of daily survival probability, and present a likelihood function demonstrating that nest-fate data collected after the fledge date do not contribute any information for parameter estimation, except in a seldom-realized special case. Consequently, it is recommended that under the Mayfield model, nest-fate data collected after the fledge date be discarded. Received 16 April 2004, accepted 31 July 2004.

Author's address: U.S. Geological Survey, Fort Collins Science Center, 2150 Centre Ave., Bldg. C, Fort Collins, CO 80526, USA; e-mail: [email protected]

Previously, I presented a generalization of the Mayfield method (Mayfield 1961, 1975) for estimating daily survival probabilities of nests, and advocated discarding nest-fate data collected after the fledge date of a nest (Stanley 2004). The reason for this recommendation is that errors or uncertainty in determining nest fate after the fledge date, combined with decisions by the investigator as to how these data should be handled, can unnecessarily bias estimates of daily survival probability. Because this problem also manifests itself in the widely used Mayfield model for daily survival probabilities, and because (in my experience) there continues to be confusion regarding when and why such data should be discarded (Manolis et al. 2000), I present in this note a simple numerical example illustrating the problem and how it can be avoided. My goal is to bring clarity to, and increase awareness of, this issue.

Let us suppose we have a population of 32 nests, each containing exactly one nestling, and that the daily survival probability ($p$) for those nests is 0.5 (these numbers were chosen for illustrative purposes, and are not intended to be realistic). Further, suppose that we know every nest is exactly 2 days from fledging, that after the first day 16 nests survive and 16 fail, and that after the second day 8 of the 16 remaining nests survive to fledging and 8 fail before fledging. Finally, of the eight nests that failed during the second day, assume that at four of the nests there was obvious evidence that the nest had been depredated (e.g., feathers, tissue remains), and at the remaining four nests there was no evidence (e.g., the nestling was carried off).

If we were studying this population of nests and had perfect knowledge of the situation just outlined (except for $p$), then the likelihood function ($L$) under which we would estimate $p$ would be proportional to $p^{16}(1 - p)^{16}p^{8}(1 - p)^{8}$ (Johnson 1979), and our maximum likelihood estimate of $p$ would be $\hat{p} = (16 + 8)/(32 + 16) = 0.5$. This estimate is mathematically equivalent to the usual Mayfield estimate, and is unbiased.

Now consider a slightly different situation, where we have the same information as above except that we do not know the fate of every nest after the second day because when we arrived at nests they were empty. We are, however, able to correctly deduce that at least 4 of the 16 nests failed because there were feather and tissue remains and we knew the nests contained only one nestling. How should we analyze these data? I present three scenarios:

Scenario 1. Because we found no evidence to the contrary, and because the nestlings were gone from the nest, we assume the 12 nests without evidence of predation successfully fledged young. Under this assumption we get $L \propto p^{16}(1 - p)^{16}p^{12}(1 - p)^{4}$, and $\hat{p} = (16 + 12)/(32 + 16) = 0.583$. This estimate is positively biased, because the true $p = 0.5$.

Scenario 2. Because we only know with certainty that four nests failed after the second day, we only use those data in our analysis. This is equivalent to assuming there were only 4 nests at risk of predation the second day (instead of 16); therefore, $L \propto p^{16}(1 - p)^{16}p^{0}(1 - p)^{4}$, and $\hat{p} = (16 + 0)/(32 + 4) = 0.444$.
This estimate is negatively biased.

[The Wilson Bulletin, Vol. 116, No. 3, September 2004]

Scenario 3. Because we cannot determine unequivocally the fate of every nest checked after the second day (the fledge date), we discard all data for nests checked after the fledge date. Under this scenario we get $L \propto p^{16}(1 - p)^{16}$, and $\hat{p} = 16/32 = 0.5$. This estimate is unbiased.

Of the three scenarios presented, only the last yields an unbiased estimate of $p$. We were able to use data collected after the first day because we knew the nests were 2 days from fledging when they were found; hence, we knew the 16 empty nests found after the first day had to have failed. However, when we found 16 empty nests after the second day we could not be certain of the fate of every nest (only 4 of them). Consequently, it was necessary to discard all data from the second day so we would not bias our estimate.

The situation above, where knowledge of nest fate is imperfect, was simplified to illustrate the main point of this note. In reality, there are likely to be some nests checked after fledging where failure or success can be determined without error. If we let $r_1$ be the probability a nest checked after the fledge date is determined to have succeeded when it did, in fact, succeed, and let $r_2$ be the probability a nest checked after the fledge date is determined to have failed when it did, in fact, fail, then the appropriate model for our data (continuing with the example above) is

$$L \propto p^{n_1}(1 - p)^{n_2}(r_1 p)^{n_3}[r_2(1 - p)]^{n_4} \times [(1 - r_1)p + (1 - r_2)(1 - p)]^{n_5}.$$

Here, $n_1$ and $n_2$ are the number of nests surviving or failing after the first day, $n_3$ and $n_4$ are the number of nests known with complete certainty to have survived or failed over the second day (i.e., known-fate nests checked after the fledge date), and $n_5$ is the number of nests checked after the fledge date where fate could not be determined with complete certainty (in the preceding numerical example $n_1 = 16$, $n_2 = 16$, $n_3 = 0$, $n_4 = 4$, and $n_5 = 12$). Using standard maximum likelihood methods under the assumption that $r_1 \neq r_2$, it can be shown that the maximum likelihood estimate for $p$ is $\hat{p} = n_1/(n_1 + n_2)$. In other words, none of the nest-fate data collected for nests after the fledge date (i.e., $n_3$, $n_4$, or $n_5$) contributes information to the parameter estimate, even though the fate of some of those nests is known with certainty. It is as if the data did not exist, or were discarded. Only in the special case where $r_1 = r_2$ (and $r_1, r_2 > 0$) do nest-fate data collected after the fledge date contribute to the estimate of $p$. In that case, $\hat{p} = (n_1 + n_3)/(n_1 + n_2 + n_3 + n_4)$.

Because in real-world situations it will almost always be the case that $r_1 \neq r_2$, and because $r_1$ and $r_2$ will usually be unknown (so equality cannot be ascertained), it is evident that nest-fate data collected after the fledge date should almost always be omitted from analyses under the Mayfield model (i.e., scenario 3 above). Attempts to use these data in an ad hoc fashion, as was illustrated by scenarios 1 and 2 above, will only serve to bias what would otherwise be an unbiased estimate.

In the material above, I have shown that even under ideal conditions (where nests are checked daily, the exact fledge date is known, and there is only one nestling per nest) nest-fate data collected after the fledge date do not contribute information for parameter estimation under the Mayfield model, and, if used in an ad hoc fashion, will introduce bias. In reality, the situation is even worse than I have portrayed.
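The two estimators quoted for the general likelihood are stated without derivation. The sketch below shows one way to recover them; it is not taken from the original note. The symbols $r$, $a$, $b$, and $m$ are introduced here purely for illustration, and the feasibility condition at the end is an added assumption about when the argument applies.

```latex
\begin{align*}
% Special case r_1 = r_2 = r: the last factor collapses to (1 - r)^{n_5}
L &\propto p^{\,n_1+n_3}(1-p)^{\,n_2+n_4}\; r^{\,n_3+n_4}(1-r)^{\,n_5}, \\
\frac{\partial \log L}{\partial p}
  &= \frac{n_1+n_3}{p} - \frac{n_2+n_4}{1-p} = 0
  \;\Longrightarrow\;
  \hat{p} = \frac{n_1+n_3}{n_1+n_2+n_3+n_4}. \\[6pt]
% General case: substitute a = r_1 p and b = r_2 (1 - p)
L &\propto p^{\,n_1}(1-p)^{\,n_2}\; a^{\,n_3}\, b^{\,n_4}\,(1-a-b)^{\,n_5},
  \qquad a = r_1 p,\quad b = r_2(1-p).
\end{align*}
```

In the general case the last three factors form a multinomial kernel in $a$ and $b$, maximized at $a = n_3/m$ and $b = n_4/m$ with $m = n_3 + n_4 + n_5$, whatever the value of $p$, provided $a \le p$ and $b \le 1 - p$ so that $r_1$ and $r_2$ remain valid probabilities. When that holds at $p = n_1/(n_1 + n_2)$, the post-fledge data place no constraint on $p$ and the estimate is $\hat{p} = n_1/(n_1 + n_2)$. In the numerical example, $m = 16$, $n_3/m = 0$, and $n_4/m = 0.25$, both at most 0.5, so the condition is met.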
In most studies, nests are not checked daily and the exact fledge date is unknown. Consequently, evidence that might indicate nest fate (e.g., the presence of nearby young, tissue remains) will have had time to disappear, and we do not know how many days passed before the nest failed or fledged. Thus, we do not know the number of "nest days" to credit to a nest and this can create additional bias, even in the special case where $r_1 = r_2$. Furthermore, for many species there is often more than one nestling present and this will further complicate accurate assignment of nest fate. For example, suppose a nest contained three nestlings and that two fledged before the third nestling was taken by a predator. We would likely conclude the nest had failed, even though it actually succeeded. Once again, this can create additional bias.

These real-world complications only serve to reinforce the main message of this paper, that nest-fate data collected after the fledge date (or more precisely, the predicted fledge date as determined by the investigator) should be omitted from the analysis.

The Mayfield model was developed under the assumption that daily survival probability ($p$) is constant, when, in reality, $p$ is probably heterogeneous (Stanley 2000, Dinsmore et al. 2002, Stanley 2004). If $p$ is heterogeneous, and if the interval between nest checks is long, then it is possible that discarding nest-fate data collected after the fledge date will result in a loss of information about the nature of heterogeneity in $p$ near the fledge date, and this could adversely affect robustness of the Mayfield estimator (D. H. Johnson pers. comm.). To prevent the loss of such information, investigators should make every effort to check nests more frequently as the predicted fledge date approaches.
Not only will this lead to more robust estimates under the Mayfield model by decreasing the net information loss from discarded data, but it also will allow investigators to continually update the predicted fledge date so that in the end it more closely approximates the actual fledge date, thereby improving estimates.
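
The estimates in the numerical example can be reproduced with a few lines of Python. This is a minimal sketch rather than code from the original note. The names `mayfield_estimate`, `log_likelihood`, and `scenarios` are hypothetical, and the grid search is included only as a sanity check on the closed-form estimate.

```python
import math

def mayfield_estimate(intervals):
    """Closed-form MLE of daily survival p.

    `intervals` is a list of (survived, failed) nest counts, one pair per
    one-day interval. For a likelihood proportional to p**S * (1 - p)**F,
    the maximum is at S / (S + F).
    """
    survived = sum(s for s, _ in intervals)
    failed = sum(f for _, f in intervals)
    return survived / (survived + failed)

def log_likelihood(p, intervals):
    """Log of the binomial-type likelihood, up to an additive constant."""
    return sum(s * math.log(p) + f * math.log(1 - p) for s, f in intervals)

# Day 1 is fully observed in every case: 16 nests survive, 16 fail.
day1 = (16, 16)

# Day-2 counts as they are treated under each analysis in the text.
scenarios = {
    "perfect knowledge": [day1, (8, 8)],
    "scenario 1": [day1, (12, 4)],  # unknown-fate nests assumed fledged
    "scenario 2": [day1, (0, 4)],   # only the 4 known failures used
    "scenario 3": [day1],           # post-fledge data discarded
}

estimates = {name: mayfield_estimate(data) for name, data in scenarios.items()}

# Sanity check (illustrative): a coarse grid search over p should land
# within one grid step of the closed-form estimate.
grid = [i / 1000 for i in range(1, 1000)]
for name, data in scenarios.items():
    best = max(grid, key=lambda p: log_likelihood(p, data))
    assert abs(best - estimates[name]) <= 1e-3

for name, p_hat in estimates.items():
    print(f"{name}: p_hat = {p_hat:.3f}")
# perfect knowledge: p_hat = 0.500
# scenario 1: p_hat = 0.583
# scenario 2: p_hat = 0.444
# scenario 3: p_hat = 0.500
```

Only the analysis that drops the second-day counts recovers the true value of 0.5 without knowing every nest's fate, which matches the recommendation in the text.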
Publication date: 2004